Impact of measurement method on interobserver variability of apparent diffusion coefficient of lesions in prostate MRI

Purpose To compare the inter-observer variability of apparent diffusion coefficient (ADC) values of prostate lesions measured by 2D-region of interest (ROI) with and without specific measurement instruction. Methods Forty lesions in 40 patients who underwent prostate MR followed by targeted prostate biopsy were evaluated. A multi-reader study (10 readers) was performed to assess the agreement of ADC values between 2D-ROI without specific instruction and 2D-ROI with specific instruction to place a 9-pixel size 2D-ROI covering the lowest ADC area. The computer script generated multiple overlapping 9-pixel 2D-ROIs within a 3D-ROI encompassing the entire lesion placed by a single reader. The lowest mean ADC values from each 2D-small-ROI were used as reference values. Inter-observer agreement was assessed using the Bland-Altman plot. Intraclass correlation coefficient (ICC) was assessed between ADC values measured by 10 readers and the computer-calculated reference values. Results Ten lesions were benign, 6 were Gleason score 6 prostate carcinoma (PCa), and 24 were clinically significant PCa. The mean±SD ADC reference value by 9-pixel-ROI was 733 ± 186 (10−6 mm2/s). The 95% limits of agreement of ADC values among readers were better with specific instruction (±112) than those without (±205). ICC between reader-measured ADC values and computer-calculated reference values ranged from 0.736–0.949 with specific instruction and 0.349–0.919 without specific instruction. Conclusion Interobserver agreement of ADC values can be improved by indicating a measurement method (use of a specific ROI size covering the lowest ADC area).

Introduction Assessment of diffusion-weighted images (DWIs) and apparent diffusion coefficient (ADC) images plays an important role in classifying prostate lesions by the Prostate Imaging Reporting and Data System (PI-RADS) scoring system. The scoring system relies on qualitative visual assessment and subjective determination of markedly hypointense signal on ADC. The DWI score drives the overall assessment score for peripheral zone (PZ) lesions and influences it for transitional zone (TZ) lesions [1][2][3][4][5][6]. ADC values have been shown to be the most useful image marker of prostate carcinoma (PCa) aggressiveness [7][8][9][10][11][12]. A meta-analysis showed that the correlation between ADC values and Gleason score (GS) was moderate (-0.48) in PZ cancer and mild (-0.22) in TZ cancer [8]. ADC values may assist differentiation between benign and malignant prostate tissue in the PZ using a threshold of 700 to 900 (10 −6 mm 2 /s) [3,7]. However, quantitative ADC threshold values are not specified in the PI-RADS scoring system since large variability of ADC values exists between magnetic resonance imaging (MRI) scanner models and also given the inconsistency in how a region of interest (ROI) is placed [13][14][15][16]. Standardization of ADC measurement should improve inter-observer variability and possibly accuracy of prostate MRI.
Some studies revealed that 0 th to 10 th percentile ADC values of 3D-whole-lesion-ROI were accurate in diagnosing clinically significant PCa (csPCa: GS �7) [9][10][11]. The major drawback of this method is that it takes a longer time to draw 3D-ROI than 2D-ROI and also requires a histogram calculation, which is not universally available at all reading stations. Therefore, radiologists usually use 2D-ROI to measure ADC values in the clinical setting. However, previous literature reported lower interobserver agreement and lower diagnostic performance of the 2D-ROI method compared to the 3D-ROI method [9,16]. This could be partly explained by the volatility of ADC values measured by 2D-ROI. We assume that the proportion of aggressive tumor components with lower ADC values could change depending on the size of the 2D-ROI.
The purpose of our multireader study was to compare the inter-observer variability of ADC values measured by 2D-ROI with and without the use of a specific ROI size covering the lowest ADC area. The variability of ADC values measured with different scanner models was also evaluated to see how its magnitude differs from that of interobserver variability.

Materials and methods
Our institutional review board approved this Health Insurance Portability and Accountability Act (HIPAA)-compliant, retrospective study. All patients had previously consented to the use of their medical records for research.

Patient enrollment
The details of patient enrollment are shown in Fig 1. Records from a total of 466 patients with 613 lesions amongst them who underwent prostate MRI followed by targeted prostate biopsy without known csPCa clinically significant prostate carcinoma (csPCa: Gleason score [GS] �7) prior to MRI) were retrieved. From those, 40 patients, each with 1 lesion (40 total lesions), were randomly selected for a multireader study. The entire cohort was used to compare ADC values taken from different scanner models.

Histopathologic examination
An MRI/US fusion trans-rectal biopsy was performed for the lesions identified on MRI. GS and the location of each lesion were obtained from the official surgical and pathologic reports.

Multi-reader study
Ten board-certified radiologists (A.K., B.K., N.L., N.T., R.P.H., M.T.H., A.K., A.T.F., C.W.B., and D.A.A. with +20, +20, 20, 20, 18, 14, 11, 10, 10 and 2 years of experience in abdominal radiology) independently measured the ADC values of the lesions using 3 different methods during 3 different sessions. The readers were blinded to the pathologic diagnosis or PI-RADS score. For each case, the lesion was marked with arrows on an axial T2-weighted image to assist reader with lesion detection. The measurement was performed on Visage 7 (Visage Imaging Inc) and/or Invivo DynaCAD (GE Healthcare) (sessions 1 and 2) or Invivo Dyna-CAD (session 3). Mean ADC value and ROI size were reported in sessions 1 and 2. ADC values of the 10 th percentile were reported in session 3.
Session 1: Free 2D-small-ROI measurement. Readers measured ADC values by placing a 2D-small-ROI on a single slice as they would usually do in the clinical setting. No other specific instructions were given. Readers were not aware of ROI size used in session 2.
Session 2: 2D-small-ROI method with 9 pixels covering lowest ADC area. Readers placed a 2D-small-ROI on a single slice in the area covering the lowest ADC area on Visage 7. ROI size was specified as area (range +/-mm 2 ), which corresponds to the 9 pixels. Session 3: 10 th percentile of 3D-whole-lesion-ROI measurement. Readers placed a 3Dwhole-lesion-ROI encompassing the entire area of visually low on Invivo DynaCAD. The 10 th percentile of ADC values was calculated from 3D-whole-lesion-ROI.

Computed-based calculation for ADC values of lesion
3D-whole-lesion-ROI placement. For each lesion, 3D-whole-lesion-ROI encompassing the entire area of the lesion with visually low ADC values was placed on the ADC images by 1 of 4 board-certified radiologists (A.K., N.L., K.Y. and H.T. with 20+, 20, 11 and 2 years of experience in abdominal radiology) (Fig 2a), using RIL contour (Mayo Clinic) [17]. The radiologists were aware of the location and MRI description of the lesion and were allowed to refer to the sequences other than the ADC images. Two of 4 radiologists also contributed to the multireader study; their reader study was performed after at least 3 months from 3D-whole-lesion-ROI placement to mitigate recall bias.
Mean ADC value of lesion by 2D-small-ROI with different pixel sizes. Multiple overlapping 2D-small-ROIs of a certain size were automatically generated within the 3D-wholelesion-ROI using computer script (Fig 2b). The mean ADC value of pixels of each of the 2Dsmall-ROIs was calculated. Among the multiple mean ADC values, the lowest mean ADC value was used as a representative value for each ROI size. This process was repeated for 9 different 2D-small-ROI sizes (1,2,3,5,9,17,25,33, and 49 pixels, square-or near-circular-ROI). If no single 2D-small-ROI of a certain size fit within the 3D-whole-lesion-ROI, the representative value of 1 size smaller ROI was used. The mean ADC value by 2D-small-ROI with 9 pixels was used as reference value for the multi-reader study.
10 th percentile ADC value of lesions by 3D-whole-lesion-ROI. A histogram of the ADC values within the 3D-whole-lesion-ROI was created (Fig 2c). ADC values for the 10 th percentile were calculated and used as the reference value for the multi-reader study. On the ADC image, 3D-whole-lesion-ROI was placed encompassing the entire area of visually low ADC (right). (b) Schematic illustration of multiple overlapping 2D-small-ROIs (9 pixels), which were automatically generated within the 3D-whole-lesion-ROI. The kernel mask was used to average the neighbor pixels to the center pixel. The reference value was determined as the minimum of the multiple

Computed-based calculation for median ADC values of the PZ parenchyma
For all patients (N = 466), the PZ of the prostate parenchyma was segmented using deep-learning assisted manual segmentation. A deep learning model with 3D-Unet (a fully convolutional network in biomedical image segmentation) was used to create preliminary zonal segmentation [18][19][20], and it was corrected by a radiologist in all cases. Median ADC values of the PZ for each patient were calculated. The median ADC value of the PZ was assumed to represent the ADC value of the normal prostate PZ parenchyma. The variability of the values was used as a surrogate indicator for assessing the influence of technical differences (scanner models and endorectal coil use) on ADC values.

Statistical analysis
Agreement of ADC values between readers was demonstrated by using Bland-Altman plots with 95% limits of agreement. Difference in 95% limits of agreement among measurement methods were assessed with Friedman test and post hoc tests with Wilcoxon rank sum test adjusted by Bonferroni-Holm method. Inter-observer agreement between measured ADC values by 2D-ROI and computer-calculated reference values was assessed using the intraclass correlation coefficient (ICC). The median ADC values of the PZ and lesions measured under different MRI acquisition conditions were compared by using unpaired t-test. P values less than 0.05 were considered statistically significant. Image and statistical analyses were performed using Python version 3.7 (Python Software Foundation) or R version 3.6.1 (The R Foundation).
Among the entire cohort of 613 lesions from 466 patients, 378 lesions were benign or GS 6 PCa, 235 were csPCa. 457 were in the PZ and 156 were in the TZ.
The results of the 3 reading sessions are summarized in Table 1  .001). The post-hoc test demonstrates statistically significant differences between session 1 and session 2 and between session 1 and session 3 (both P < .001), whereas there was no significant difference between session 2 and session 3 (P = .46). For the entire cohort, median ADC values of the PZ prostate parenchyma were compared under the different MRI acquisition conditions. The scanner models were Discovery MR 750w (GE Healthcare) (n = 275 patients, 349 lesions), Skyra (n = 144 patients, 195 lesions), Discovery MR 750 (GE Healthcare) (n = 38 patients, 59 lesions), Optima MR 450w (GE Healthcare)

Discussion
Our study showed that improvement of interobserver agreement of ADC measurement could be achieved by specifying the 2D-ROI measurement method. In session 2 of the multi-reader study, readers were instructed to use 9-pixel 2D-small-ROI (size specification) covering the lowest ADC area (position specification). The 95% limits of agreement of ADC values among readers were superior with the 2D-small-ROI method with 9 pixels covering the lowest ADC area (± 120 [10 −6 mm 2 /s]) compared to the free 2D-small-ROI method (± 205 [10 −6 mm 2 /s]) (P < .001). ICC between ADC values measured by each reader and computer-calculated reference values was also superior with the 2D-small-ROI method with 9 pixels covering the lowest ADC area (0.736-0.949) compared to the free 2D-small-ROI method (0.349-0.919). In our study, the minimum computer-calculated mean ADC values by 2D-small-ROI were used as representative values to correspond to mean ADC values measured by the 2D-small-ROI that covered the lowest ADC area. The computer-calculated mean ADC values increased as pixel size increased, ranging from 635 (10 −6 mm 2 /s) in 1 pixel to 840 (10 −6 mm 2 /s) in 49 pixels. This result suggest that larger 2D-ROI sizes will increase the measured ADC values. PCa is often heterogeneous, and a high-grade component may sparsely present within a low-grade component or non-neoplastic tissue [16,21]. ADC images probably reflect tumor heterogeneity, and using a 2D-ROI to cover the lowest ADC area likely captures the small focus of highergrade tumors. With larger ROIs, the more lower-grade components would be included, resulting in a higher mean ADC value.
We consider the 2D-small-ROI method with 9 pixels covering the lowest ADC area likely equivalent in interobserver agreement to the 3D-whole-lesion-ROI method with 10 th percentile. The 2D-small-ROI method with specific instruction achieved almost the same level of 95% limits of agreement among the readers as the 10 th percentile 3D-whole-lesion-ROI method (± 112 [10 −6 mm 2 /s]) (P = .46). When the 4 best radiologists' results were used, the 95% limits of agreement among the readers in the 2D-small-ROI method with 9 pixels covering the lowest ADC area were ± 86 (10 −6 mm 2 /s), slightly better than the 3D-whole-lesion-ROI method (P = .189). The 2D-small-ROI method relies on radiologists' detection of the small focus of the lowest ADC area. Although it is an intuitive task, careful placement of ROI could probably result in better interobserver agreement. For example, displaying images using a narrow window setting, enlarging images such that they appear pixelated, and placing the ROI on a few different sections may be necessary to find the lowest ADC area. Additionally, placing a small sub-2D-ROI in the lowest ADC area for targeted biopsy could increase the yield of higher grade disease in heterogeneous tumors.
Various factors other than 2D-ROI measurement method could affect ADC values. Previous reports demonstrated that ADC values were inconsistent among different MRI scanner models [13][14][15]22]. Our study compared median ADC values of the PZ of the prostate parenchyma among different scanner models (Skyra and Discovery MR 750w). ADC values measured on 1 model were lower than the other in all 3 regions; PZ of the prostate parenchyma (277 [10 −6 mm 2 /s), 27%), csPCa lesions (43 [10 −6 mm 2 /s), 6%) and non-csPCa lesions (95 [10 −6 mm 2 /s), 10%). A prior human study compared ADC values of upper abdominal organs between different scanner models, and demonstrated significant differencesin liver, pancreas and kidney, whereas spleen and gallbladder showed no significant difference [22]. Our study suggests that normal prostate gland could also be significantly affected by scanner model. Interestingly, mean ADC values of csPCa lesions with more restricted diffusion were not significant between the 2 models, whereas those of non-csPCa lesions with less restricted diffusion were significant. Similar results were found in a prior phantom study using 6 different scanner models, demonstrating that fluid with less restricted diffusion was affected by scanner model. The difference in ADC values of distilled water was 540 (10 −6 mm 2 /s) (range: 1885-2425 [10 −6 mm 2 /s]), which was greater than that of 25% sodium chloride of (314 [10 −6 mm 2 /s] [range: 1265-1579 {10 −6 mm 2 /s}]) [13]. Endorectal coil use also affected ADC values in our study, although previous reports showed no significant difference in the ADC of a lesion with or without endorectal coil use [23]. The choice of b values might affect ADC estimates as well [24,25]. Our study did not evaluate ADC values on different b-value sets since we assumed considerable confounding bias in scanner models used in our institution.
In our study, 95% limits of agreement of ADC values among the readers on the Bland-Altman plots were ±205 (10 −6 mm 2 /s) for the free 2D-small-ROI method, (session 1) and ±120 (10 −6 mm 2 /s) for the 2D-small-ROI method with 9 pixels covering the lowest ADC area (session 2). Our study was not able to directly compare 95% limits of agreement of ADC values among the scanner models since each lesion was imaged with only 1 scanner. However, we could refer to prior studies investigating ADC difference among scanners in the same area. In the previous phantom study of inter-scanner variability with 4 different fluids scanned by 6 different MRI models by 4 different vendors demonstrated the 95% limits of agreement of ADC values among the scanners were ±306 (10 −6 mm 2 /s) [13]. In this study, one particular model showed considerably higher values than the other 5 scanners. Among these five scanners, the 95% limits of agreement of ADC values were ±112 (10 −6 mm 2 /s) [13]. Sasaki et al. reported interscanner variability of ADC values up to 8% in gray and white matter in the same patients with different scanners [14]. Therefore, we can speculate that the influence of interobserver variability on measured ADC value is similar or larger than that of interscanner variability.
Our study had several limitations. First, our study was a retrospective analysis. Second, 3Dwhole-lesion-ROI for the computer-generated calculation was retrospectively set by 4 different radiologists, and the border of the lesion varied depending on the radiologists. Third, the number of lesions for the multireader study was relatively small (N = 40). However, 10 radiologists evaluated 40 lesions, so we assume that an appropriate number of data (N = 400) was used for calculating 95% confidence intervals. Moreover, it is assumed that each radiologist had a certain tendency in his or her ADC measurement method, and therefore a similar result would be expected even if the number of evaluated lesions was increased. Fourth, there should exist multivariable confounding factors in the analysis of interscanner difference. The purpose of this analysis was to demonstrate the approximate influence of scanner model difference on ADC values and see how it differed from interobserver variability. Finally, we used arbitrary 9-pixel 2D-small-ROI in this study and the optimal size of 2D-ROI to represent tumor aggressiveness remains an issue.
In conclusion, interobserver agreement of ADC values was superior with the 2D-small-ROI method with 9 pixels covering the lowest ADC area compared to the free 2D-small-ROI method.
Supporting information S1 File. Annonimized data for the all cohort. (XLSX) S2 File. Annonimized data for the reader study by 10 radiologists for 40 lesions. (XLSX)